IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
Patentee: Steger
Examiner: Jon Chang
Patent Number: US 7,062,093 B2
Group Art Unit: 2623
Issue Date: June 13, 2006
Title: SYSTEM AND METHOD FOR OBJECT RECOGNITION



STATEMENT OF FILING BY EXPRESS MAIL 37 C.F.R. SECTION 1.10 
This correspondence is being deposited with the United States Postal Service on 
March 22, 2007 in an envelope as "Express Mail Post Office to Addressee" Mail Label 
Number ER 059 678 242 US addressed to the Commissioner for Patents, P.O. Box 1450, 
Alexandria, VA 22313-1450. 



ATTN: Certification of Correction Branch 

Commissioner for Patents 

P.O. Box 1450 

Alexandria, VA 22313-1450 

REQUEST FOR EXPEDITED ISSUANCE OF A 
CERTIFICATE OF CORRECTION - ERROR ATTRIBUTABLE TO THE OFFICE 

The patentee requests the issuance of a Certificate of Correction in connection with the 
above-identified patent, as per the attached Forms PTO/SB/44, in which the following revisions 
are requested: 

On the cover page, in the inventor's name,
"Carstan" should read --Carsten--.

Column 2, line 5, "ϕ" should read --φ--.

Column 2, line 21, "ϕ" should read --φ--.

Column 2, line 24, "ϕ" should read --φ--.
Column 10, lines 10-14, the lower portion of the equation
after the second should read:



Column 11, line 55, "ϕmin ≤ ϕ ≤ ϕmax" should read:

φmin ≤ φ ≤ φmax

Column 13, line 6, "ϕ" should read --φ--.
Column 14, line 40, "Δϕ" should read --Δφ--.
Column 16, line 25, "ϕ" should read --φ--.
Column 16, line 44, "ϕ" should read --φ--.
Column 16, line 46, "ϕ" should read --φ--.
Column 18, lines 1-4, the equation should read:



Column 21, line 31, "ϕmin ≤ ϕ ≤ ϕmax" should read:

φmin ≤ φ ≤ φmax

Column 22, line 48, "ϕ" should read --φ--.
REMARKS

The present U.S. patent US 7,062,093 B2 was filed as a non-provisional utility 
application on November 26, 2001 as U.S. application number 09/965,236. In the U.S. patent 
US 7,062,093 B2 granted therefrom on June 13, 2006, numerous errors were printed in the 
issued patent, as set forth herein and in the accompanying Forms PTO/SB/44. 

It is respectfully submitted that all of the errors set forth herein and in the accompanying
Forms PTO/SB/44 were due solely to an office mistake of the U.S. Patent and Trademark Office.

Accordingly, no fees are due. 

Regarding the name of the inventor, there is only one inventor, and the correction to the 
name of the sole inventor is of a typographical nature, which does not affect the inventorship of 
the issued patent. Support for the correct first name of the inventor to be "Carsten" is clearly 
shown on the most up-to-date Corrected Filing Receipt dated May 2, 2006, as well as the Notice 
of Allowance and Fee(s) Due, dated May 9, 2005 (copies enclosed). 

As to the text of the issued patent, such errors and discrepancies in the issued patent 
compared to the text of the application as originally filed by the applicant are clearly disclosed in 
the records of the U.S. Patent and Trademark Office, as shown in supporting documentation, 
being copies of the relevant pages of the application as originally filed, attached as an appendix 
to the present request. The correct text is indicated in red on the attached pages of the 
application as originally filed, that is: 

at page 2, line 20; 

at page 3, lines 8 and 10; 

at page 13, line 20; 

at page 16, line 25; 
at page 19, line 18;

at page 23, line 1; 

at page 26, line 22; 

at page 27, lines 12 and 13; 

at page 30, line 1; 

at page 36, line 8; and 

at page 38, line 4. 

Accordingly, expedited issuance of a Certificate of Correction to correct the aforesaid 
errors and discrepancies is respectfully requested. 

Since the aforesaid errors and discrepancies in the issued patent were solely due to an 
Office Mistake, no fees are required, as per 35 U.S.C. § 254 and 37 C.F.R. § 1.322(b). 

In case of any deficiencies in fees by the filing of the present Request for Certificate of 
Correction, the Commissioner is hereby authorized to charge such deficiencies in fees to Deposit 
Account Number 01-0035. 



Respectfully submitted,




Date: March 22, 2007 



Anthony J. Natoli 
Registration number 36,223 
Attorney for patentee 



ABELMAN, FRAYNE & SCHWAB 
666 Third Ave., 10th Floor 
New York, NY 10017-5621 
(212) 949-9022 
INVENTOR(S) : Carsten Steger

It is certified that an error appears or errors appear in the above-identified patent and that said Letters Patent 
is hereby corrected as shown below: 



On the cover page, in the inventor's name,

"Carstan" should read --Carsten--.

Column 2, line 5, "ϕ" should read --φ--.

Column 2, line 21, "ϕ" should read --φ--.

Column 2, line 24, "ϕ" should read --φ--.

Column 10, lines 10-14, the lower portion of the equation 
after the second should read: 
PATENT NO. : US 7,062,093



Column 11, line 55, "ϕmin ≤ ϕ ≤ ϕmax" should read:

φmin ≤ φ ≤ φmax

Column 13, line 6, "ϕ" should read --φ--.

Column 14, line 40, "Δϕ" should read --Δφ--.

Column 16, line 25, "ϕ" should read --φ--.

Column 16, line 44, "ϕ" should read --φ--.

Column 16, line 46, "ϕ" should read --φ--.
Column 18, lines 1-4, the equation should read:
in the image. If the interior orientation of the camera is unknown, a perspective projection between two planes (i.e., the surface of the object and the image plane) can be described by a 3×3 matrix in homogeneous coordinates:

$$
\begin{pmatrix} x' \\ y' \\ t' \end{pmatrix} =
\begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix}
\begin{pmatrix} x \\ y \\ t \end{pmatrix}
$$



The matrix and vectors are only determined up to an overall scale factor (see Hartley and Zisserman (2000) [Richard Hartley and Andrew Zisserman: Multiple View Geometry in Computer Vision. Cambridge University Press, 2000], chapters 1.1-1.4). Hence, the matrix, which determines the pose of the object, has eight degrees of freedom. If the interior orientation of the camera is known, these eight degrees of freedom reduce to the six degrees of freedom of the pose of the object with respect to the camera (three for translation and three for rotation).
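The scale ambiguity can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper `apply_homography` and the example matrix `P` are hypothetical and chosen only for illustration. It maps points through a 3×3 matrix in homogeneous coordinates and shows that multiplying the matrix by any nonzero factor leaves the mapped points unchanged, which is why only eight of its nine entries are independent.

```python
import numpy as np

def apply_homography(P, pts):
    """Map 2D points through a 3x3 projective matrix P using homogeneous
    coordinates: (x, y, 1) -> (x', y', t') -> (x'/t', y'/t')."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))])  # (x, y, 1)
    mapped = homog @ np.asarray(P, dtype=float).T       # (x', y', t')
    return mapped[:, :2] / mapped[:, 2:3]               # dehomogenize

# Hypothetical example matrix (not taken from the source).
P = np.array([[1.0, 0.2, 5.0],
              [0.1, 0.9, -3.0],
              [0.001, 0.002, 1.0]])
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

# Scaling P by a nonzero factor does not change the mapped points.
print(np.allclose(apply_homography(P, pts), apply_homography(3.7 * P, pts)))  # True
```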

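Often, this type of transformation is approximated by a general 2D affine transformation, i.e., a transformation where the output points (x', y')^T are obtained from the input points (x, y)^T by the following formula:

$$
\begin{pmatrix} x' \\ y' \end{pmatrix} =
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} +
\begin{pmatrix} t_x \\ t_y \end{pmatrix}
$$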

General affine transformations can, for example, be decomposed into the following geometrically intuitive transformations: a scaling of the original x and y axes by different scaling factors s_x and s_y, a skew transformation of the y axis with respect to the x axis, i.e., a rotation of the y axis by an angle θ while the x axis is kept fixed, a rotation of both axes by an angle φ, and finally a translation by a vector (t_x, t_y)^T. Therefore, an arbitrary affine transformation can be written as:

$$
\begin{pmatrix} x' \\ y' \end{pmatrix} =
\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}
\begin{pmatrix} 1 & -\sin\theta \\ 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} +
\begin{pmatrix} t_x \\ t_y \end{pmatrix}
$$
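The decomposition can be sketched in code as a product of a rotation, a skew, and a scaling matrix, followed by a translation. This is an illustrative sketch assuming NumPy and angles in radians; the function names `affine_from_params` and `transform` are hypothetical. It applies the transformation to the corners of a unit square, as in the Figure 1 illustration.

```python
import numpy as np

def affine_from_params(sx, sy, theta, phi, tx, ty):
    """Compose A = R(phi) @ K(theta) @ S(sx, sy) and t = (tx, ty): scale the
    axes, skew the y axis by theta (x axis fixed), rotate both axes by phi,
    then translate. Angles are in radians."""
    S = np.array([[sx, 0.0], [0.0, sy]])
    K = np.array([[1.0, -np.sin(theta)], [0.0, np.cos(theta)]])  # skew of the y axis
    R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    return R @ K @ S, np.array([tx, ty], dtype=float)

def transform(A, t, pts):
    """Apply x' = A x + t to each row of pts."""
    return np.asarray(pts, dtype=float) @ A.T + t

# Unit square mapped to a parallelogram.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
A, t = affine_from_params(2.0, 1.5, np.radians(20), np.radians(30), 4.0, 1.0)
print(transform(A, t, square))
```

Setting theta = 0 and sx = sy yields a similarity transformation (A^T A is a multiple of the identity), and additionally sx = sy = 1 yields a rigid transformation.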
Figure 1 displays the parameters of a general affine transformation graphically. Here, a square of side length 1 is transformed into a parallelogram. Similarity transformations are a special case of affine transformations in which the skew angle θ is 0 and both scaling factors are identical, i.e., s_x = s_y = s. Likewise, rigid transformations are a special case of similarity transformations in which the scaling factor is 1, i.e., s = 1. Finally, translations are a special case of rigid transformations in which φ = 0. The relevant parameters of the class of geometrical transformations will be referred to as the pose of the object in the image. For example, for rigid transformations the pose consists of the rotation angle φ and the translation vector (t_x, t_y)^T. Object recognition hence is the determination of the poses of all instances of the model in the image.

Several methods have been proposed in the art to recognize objects in images. Most of them suffer from the restriction that the model will not be found in the image if it is occluded or degraded by additional clutter objects. Furthermore, most of the existing methods will not detect the model if the image exhibits non-linear contrast changes, e.g., due to illumination changes.

All of the known object recognition methods generate an internal representation of the model in memory at the time the model is generated. To recognize the model in the image, in most methods the model is systematically compared to the image using all allowable degrees of freedom of the chosen class of transformations for the pose of the object (see, e.g., Borgefors (1988) [Gunilla Borgefors. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on
The advantage of this match metric is that neither the model image nor the image in which the model should be recognized needs to be segmented (binarized), i.e., it suffices to use a filtering operation that only returns direction vectors instead of an extraction operation which also segments the image. Therefore, if the model is generated by edge or line filtering, and the image is preprocessed in the same manner, this match metric fulfills the requirements of robustness to occlusion and clutter. If parts of the object are missing in the image, there are no lines or edges at the corresponding positions of the model in the image, i.e., the direction vectors in the image will have a small length and hence contribute little to the sum. Likewise, if there are clutter lines or edges in the image, there will either be no point in the model at the clutter position or it will have a small length, which means it will contribute little to the sum. Therefore, the above match metric reflects how well the points in the image and model that correspond to each other align geometrically.

However, with the above match metric, if the image brightness is changed, e.g., by a constant factor, the match metric changes by the same factor. Therefore, it is preferred to modify the match metric by calculating the sum of the normalized dot product of the direction vectors of the transformed model and the image over all points of the model, i.e.:

Because of the normalization of the direction vectors, this match metric is additionally invariant to arbitrary illumination changes. In this preferred embodiment
All three normalized match metrics have the property that they return a number less than or equal to 1 as the score of a potential match. In all cases, a score of 1 indicates a perfect match between the model and the image. Furthermore, the score roughly corresponds to the portion of the model that is visible in the image. For example, if the object is 50% occluded, the score cannot exceed 0.5. This is a highly desirable property because it gives the user the means to select an intuitive threshold for when an object should be considered as recognized.
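The contrast behavior of the unnormalized and normalized metrics can be illustrated with a small sketch. It assumes NumPy, assumes the sums are divided by the number of model points n (consistent with a perfect match scoring 1), and represents the model and image direction vectors at corresponding points as two n×2 arrays; the function names are hypothetical, and the exact formula of the preferred embodiment is not reproduced in this excerpt.

```python
import numpy as np

def raw_score(model_dirs, image_dirs):
    """Mean dot product of corresponding model and image direction vectors."""
    return float(np.mean(np.sum(model_dirs * image_dirs, axis=1)))

def normalized_score(model_dirs, image_dirs, eps=1e-12):
    """Mean of the normalized dot products, i.e. of the cosines of the angles
    enclosed by corresponding direction vectors. Points whose image vector is
    (near) zero, e.g. because that part of the object is occluded, add 0."""
    dots = np.sum(model_dirs * image_dirs, axis=1)
    norms = np.linalg.norm(model_dirs, axis=1) * np.linalg.norm(image_dirs, axis=1)
    cos = np.where(norms > eps, dots / np.maximum(norms, eps), 0.0)
    return float(np.mean(cos))

rng = np.random.default_rng(1)
model = rng.normal(size=(100, 2))
brighter = 5.0 * model                       # same geometry, 5x the contrast
print(raw_score(model, brighter) / raw_score(model, model))  # about 5: scales with contrast
print(normalized_score(model, brighter))     # about 1.0: unaffected by contrast
occluded = brighter.copy()
occluded[:50] = 0.0                          # half of the model points occluded
print(normalized_score(model, occluded))     # about 0.5
```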

Since the dot product of the direction vectors is related to the angle the direction vectors enclose by the arc cosine function, other match metrics could be defined that also capture the geometrical meaning of the above match metrics. One such metric is to sum up the absolute values of the angles that the direction vectors in the model and the direction vectors in the image enclose. In this case, the match metric would return values greater than or equal to zero, with a value of zero indicating a perfect match, and the pose of the model must be determined from the minimum of the match metric.
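A sketch of the angle-based alternative, assuming NumPy and nonzero direction vectors; the names `angle_sum_metric` and `rotate` are hypothetical. Because zero now means a perfect match, the best pose among a set of candidate rotations is the one that minimizes the metric.

```python
import numpy as np

def angle_sum_metric(model_dirs, image_dirs):
    """Sum of the absolute angles (radians) enclosed by corresponding model and
    image direction vectors. 0 means perfect alignment, so the best pose is the
    one that MINIMIZES this value. Assumes no zero-length vectors."""
    dots = np.sum(model_dirs * image_dirs, axis=1)
    norms = np.linalg.norm(model_dirs, axis=1) * np.linalg.norm(image_dirs, axis=1)
    cos = np.clip(dots / norms, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return float(np.sum(np.abs(np.arccos(cos))))

def rotate(vectors, angle):
    """Rotate row vectors counterclockwise by angle (radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return vectors @ np.array([[c, -s], [s, c]]).T

model = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
image = rotate(model, np.radians(10.0))
candidates = np.radians(np.arange(0.0, 20.0, 2.5))  # coarse rotation samples
best = min(candidates, key=lambda a: angle_sum_metric(rotate(model, a), image))
print(np.degrees(best))  # close to 10
```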

Object Recognition Method 

To find the object in the image, the a-priori unbounded search space needs to be bounded. This is achieved by the user setting thresholds for the parameters of the search space. Therefore, in the case of affine transformations, the user specifies thresholds for the two scaling factors, the skew angle, and the rotation angle:
e.g., the Gaussian filter used in the Canny edge extractor and the Steger line detector.

The transformation space needs to be discretized in such a manner that the above requirement of all model points lying at most k pixels from the instance in the image can be ensured. Figure 3 shows a sample model of a key along with the parameters that are used to derive the discretization step lengths. The point o is the reference point of the model, e.g., its center of gravity. The distance d_max is the largest distance of all model points from the reference point. The distance d_x is the largest distance of all model points from the reference point measured in the x direction only, i.e., only the x coordinates of the model points are used to measure the distances. Likewise, the distance d_y is the largest distance of all model points from the reference point measured in the y direction only. To ensure that all model points created by scaling the model in the x direction lie within k pixels from the instance in the image, the step length Δs_x must be chosen as Δs_x = k/d_x. Likewise, Δs_y must be chosen as Δs_y = k/d_y. The discretization of the skew angle only depends on the distance d_y, since the x axis remains fixed in a skew operation. Hence, the step length of the skew angle Δθ must be chosen as Δθ = arccos(1 − k²/(2d_y²)). Similarly, the step length of the rotation angle must be chosen as Δφ = arccos(1 − k²/(2d_max²)). Finally, the step lengths in the translation parameters must both be equal to k, i.e., Δt_x = Δt_y = k.
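The step-length rules can be computed directly from the model points. The following is a minimal sketch, assuming NumPy, model points given as (x, y) pairs, and a caller-supplied reference point; the function name `discretization_steps` is hypothetical. The angle formulas follow from the law of cosines: a point at distance d that is rotated by an angle a moves along a chord of length k with k² = 2d²(1 − cos a).

```python
import numpy as np

def discretization_steps(points, reference, k=1.0):
    """Step lengths such that no model point moves by more than k pixels
    between neighboring samples of the transformation space."""
    rel = np.asarray(points, dtype=float) - np.asarray(reference, dtype=float)
    d_max = np.max(np.hypot(rel[:, 0], rel[:, 1]))  # largest distance to reference
    d_x = np.max(np.abs(rel[:, 0]))                 # largest distance in x only
    d_y = np.max(np.abs(rel[:, 1]))                 # largest distance in y only
    return {
        "ds_x": k / d_x,
        "ds_y": k / d_y,
        # Chord of length k on a circle of radius d: k^2 = 2 d^2 (1 - cos a).
        "dtheta": np.arccos(1.0 - k**2 / (2.0 * d_y**2)),
        "dphi": np.arccos(1.0 - k**2 / (2.0 * d_max**2)),
        "dt_x": k,
        "dt_y": k,
    }

# Hypothetical model points; the reference point is placed at the origin here.
pts = [(-50, 0), (50, 0), (0, -20), (0, 20), (30, 40)]
steps = discretization_steps(pts, (0, 0), k=1.0)
print(np.degrees(steps["dphi"]))  # about 1.146 degrees
```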
For a further speed-up, the number of points in the model is preferably also reduced by a factor k = 2^l for the different discretization levels l of the search space (see the section on model generation below). The necessary smoothing is preferably obtained by using a mean filter, because it can be implemented recursively, and hence the runtime to smooth the image does not depend on the smoothing parameter. Alternatively, a recursive implementation of the Gaussian smoothing filter can be used. Although this discretization and smoothing method already yields acceptable runtimes, a further speed-up can be obtained by also subsampling the image by a factor identical to the translation step lengths Δt_x and Δt_y on each discretization level. In this case, the step lengths for the translation parameters will, of course, have to be set to 1 in each level. Now, however, care must be taken to propagate the correct translation parameters through the levels of discretization. If, as described above, the translation step lengths double in each level of the discretization space, the translation parameters of the found objects must be
below in the section on model generation, the transformed models may be precomputed at the time the model is generated. If this was not done, the model must be transformed in this step by applying the affine transformation parameters to the points of the model and the linear transformation parameters to the direction vectors of the model. This results in a score for each possible combination of parameter values. The scores are then compared with the user-selected threshold s_min. All scores exceeding this threshold are combined into regions in the search space. In these regions, local maxima of the match metric are computed by comparing the scores of a certain set of parameters with the scores of neighboring transformation parameters. The resulting local maxima correspond to the found instances of the model in the coarsest discretization of the search space. These found instances are inserted into a list of instances, which is sorted by the score of the instance.
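The thresholding and local-maximum step can be sketched on a simplified search space. This sketch assumes NumPy and, as a simplification, a 2D array of scores (for example, all translations for one fixed set of linear parameters) with an 8-neighborhood comparison; the full method works in the multi-dimensional parameter space. The name `local_maxima` is hypothetical, and cells on a plateau of equal scores are all reported.

```python
import numpy as np

def local_maxima(scores, s_min):
    """Return (row, col, score) for cells whose score exceeds s_min and is not
    smaller than any of their 8 neighbors, sorted by decreasing score."""
    scores = np.asarray(scores, dtype=float)
    h, w = scores.shape
    padded = np.pad(scores, 1, constant_values=-np.inf)
    is_max = scores > s_min
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Neighbor at offset (dr, dc) for every cell, -inf outside the grid.
            neighbor = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_max &= scores >= neighbor
    rows, cols = np.nonzero(is_max)
    found = [(int(r), int(c), float(scores[r, c])) for r, c in zip(rows, cols)]
    return sorted(found, key=lambda f: f[2], reverse=True)

scores = np.zeros((5, 5))
scores[1, 1], scores[3, 3], scores[3, 4] = 0.9, 0.7, 0.6
print(local_maxima(scores, s_min=0.5))
```

The cell at (3, 4) exceeds the threshold but is dominated by its neighbor at (3, 3), so only two instances are reported, already sorted by score as the list of instances requires.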

Once the exhaustive match on the coarsest discretization level is complete, the found instances are tracked through the finer levels of the discretization space until they are found at the lowest level of the discretization space (step 5). The tracking is performed as follows: The first unprocessed model instance is removed from the list of model instances. This is the unprocessed instance with the best score, since the list of instances is sorted by the score. The pose parameters of this instance are then used to define a search space in the next lower level of the discretization. Ideally, the model would be located at the position given by the appropriate transformation of the pose parameters, i.e., the scaling parameters s_x and s_y as well as the angles φ and θ are rounded to the closest parameter in the next finer level of the discretization, while the translation parameters are either scaled by a factor of 2, if image pyramids are used, or passed unmodified, if subsampling is not used. However, since the instance has been found in a coarse discretization level in which the image has been smoothed twice as much as in the next finer level, there is an uncertainty in the pose parameters that must be taken into account when forming the search space in the next lower level of the discretization. A good choice for the search space is obtained by constructing a rectangle of size 5×5 around the propagated translation parameters. Furthermore, the search space for the other four parameters is constructed by including the next lower and higher values of the parameters in the finer level into the search space.

As an example, suppose the space of transformations consists of the rigid transformations, that image pyramids have been used, and that the instance has been found in level l = 3 of the discretization with the following pose: (t_x, t_y, φ) = (34, 27, 55.020°). Then the search space in the finer level l = 2 is given by 66 ≤ t_x ≤ 70, 52 ≤ t_y ≤ 56, and 52.716° ≤ φ ≤ 57.300° (the table with the discretization step lengths in the example above should be kept in mind). The model is then searched with all transformations in the search space in the finer level by computing the match metric in the same manner as described above for the exhaustive match on the coarsest level of discretization. The maximum score within the search space is identified. If the corresponding pose lies at the border of the search space, the search space is iteratively enlarged at that border until the pose with the maximum score lies completely within the search space, i.e., not at the borders of the search space. If the maximum score thus obtained exceeds the user-selected threshold s_min, the instance is added to the list of found instances in the appropriate place according to its score.
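The construction of the finer-level search space in the rigid-transformation example can be reproduced with a short sketch. The function name `finer_search_space` is hypothetical, and the finer-level rotation step of 2.292° is an assumption inferred from the example numbers (the step-length table itself is not part of this excerpt).

```python
def finer_search_space(tx, ty, phi, phi_step_fine, pyramid=True, half_width=2):
    """Search ranges at the next finer level for a rigid pose found at a
    coarser level: a 5x5 window around the propagated translation, and the
    rotation rounded to the finer grid, extended by one step on each side."""
    scale = 2 if pyramid else 1        # translations double with image pyramids
    cx, cy = tx * scale, ty * scale
    idx = round(phi / phi_step_fine)   # closest angle on the finer grid
    return {
        "tx": (cx - half_width, cx + half_width),
        "ty": (cy - half_width, cy + half_width),
        "phi": ((idx - 1) * phi_step_fine, (idx + 1) * phi_step_fine),
    }

# Pose found at level 3; 2.292 degrees is an assumed level-2 rotation step.
space = finer_search_space(34, 27, 55.020, phi_step_fine=2.292)
print(space)  # tx 66..70, ty 52..56, phi approximately 52.716..57.300 degrees
```

The iterative enlargement at the border of the search space is omitted here; it would re-run the search on a shifted window whenever the best pose lands on the window's edge.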
With this approach, the model and image points must be extracted with subpixel precision. If they are only extracted with pixel precision, the model points on average cannot be moved closer to the image points than approximately 0.25 pixels because of the discrete nature of the model and image points, and hence no improvement in the accuracy of the pose would result. However, even if the model and image points are extracted with subpixel precision, the model and the image cannot be registered perfectly because usually the image and model points will be offset laterally, which typically results in a nonzero average distance even if the model and the found instance would align perfectly. Furthermore, the traditional least-squares approach neglects the direction information inherent in the model and the image. These shortcomings can be overcome by minimizing the distance of the image points from the line through the corresponding model point that is perpendicular to the direction stored in the model. For edges and lines, this line is parallel to the model edge or line. The line through the model point p_i in the direction perpendicular to the model direction vector d_i is given by d_i^T (p − p_i) = d_i^x (x − p_i^x) + d_i^y (y − p_i^y) = 0. Therefore, the following distance would need to be minimized:

$$
\sum_{i=1}^{n} \left[ d_i^{T} \left( q_i - (A p_i + t) \right) \right]^2 \rightarrow \min
$$
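A point-to-line least-squares fit of this kind can be sketched as a single linear solve. The sketch below, assuming NumPy and known point correspondences, follows the cheaper variant mentioned in the next paragraph: it estimates the affine map from the image points q_i to the model points p_i, so the model direction vectors d_i stay fixed and the residual d_i^T (A q_i + t − p_i) is linear in the six unknowns. The function name `fit_affine_point_to_line` and the synthetic data are hypothetical; this is an illustration of the technique, not the method as claimed.

```python
import numpy as np

def fit_affine_point_to_line(q, p, d):
    """Least-squares affine map (A, t) that moves image points q onto the lines
    through the corresponding model points p perpendicular to the model
    direction vectors d, i.e. minimizes sum_i [d_i . (A q_i + t - p_i)]^2.
    The residual is linear in (a11, a12, a21, a22, tx, ty)."""
    q, p, d = (np.asarray(a, dtype=float) for a in (q, p, d))
    dx, dy = d[:, 0], d[:, 1]
    qx, qy = q[:, 0], q[:, 1]
    # One row per correspondence: coefficients of a11, a12, a21, a22, tx, ty.
    M = np.column_stack([dx * qx, dx * qy, dy * qx, dy * qy, dx, dy])
    rhs = dx * p[:, 0] + dy * p[:, 1]
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x[:4].reshape(2, 2), x[4:]

# Synthetic check: image points slide laterally along each model line, which a
# point-to-point fit would penalize but a point-to-line fit ignores.
rng = np.random.default_rng(0)
n = 30
p = rng.uniform(-50.0, 50.0, size=(n, 2))        # model points
ang = rng.uniform(0.0, 2.0 * np.pi, size=n)
d = np.column_stack([np.cos(ang), np.sin(ang)])   # unit model directions
perp = np.column_stack([-d[:, 1], d[:, 0]])       # direction along the model line
A_true = np.array([[1.02, 0.05], [-0.03, 0.98]])
t_true = np.array([2.5, -1.0])
slide = rng.uniform(-0.5, 0.5, size=n)
# Image points q chosen so that A_true q + t_true = p + slide * perp.
q = np.linalg.solve(A_true, (p + slide[:, None] * perp - t_true).T).T
A, t = fit_affine_point_to_line(q, p, d)
print(np.round(A, 6), np.round(t, 6))
```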

An approach of this type for determining a rigid transformation is described in Wallack and Manocha (1998) [Aaron Wallack and Dinesh Manocha. Robust Algorithms for Object Localization. International Journal of Computer Vision, 27(3):243-262, 1998]. This approach assumes that the correspondence problem has already been solved. Furthermore, the model features are line segments and circular arcs instead of points and direction vectors. However, this approach is computationally inefficient because both the model points and their direction vectors need to be transformed. Approximately half of the operations can be saved if instead the transformation from the image points to the model points is computed in the least-squares fit, i.e.,
of the search space. If an image pyramid has been used in step (2) to transform the image, this reduction of the number of points in the model happens automatically. If the subsampling was not performed, the number of data points is reduced after the feature extraction in step (6) below.

For each level of discretization, the search space is sampled according to the discussion of the object recognition method above, using user-specified bounds on the linear transformation parameters:



The translation parameters are not sampled, i.e., fixed translation parameters t_x = t_y = 0 are used, because the translation parameters do not change the shape of the model. The steps (5)-(7) are performed for each set of parameters from the sampled search space for the current level of discretization. The reason for sampling the search space is to precompute all possible shapes of the model under the allowable transformations and to store them in memory, leading to a significant reduction of runtime in the object recognition phase.
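The sampling of the linear parameters can be sketched as a Cartesian product over the four parameter axes. This sketch assumes NumPy, bounds and step lengths passed as dictionaries (angles in degrees here for readability), and a hypothetical generator name `sample_linear_params`. Each yielded parameter set would correspond to one precomputed model, so the number of models is the product of the per-axis sample counts.

```python
import itertools
import numpy as np

def sample_linear_params(bounds, steps):
    """Enumerate every combination of the linear parameters (sx, sy, theta, phi)
    within the user bounds, using the given step lengths. Translations are
    fixed at zero because they do not change the shape of the model."""
    axes = []
    for name in ("sx", "sy", "theta", "phi"):
        lo, hi = bounds[name]
        n = int(np.floor((hi - lo) / steps[name] + 1e-9)) + 1  # samples, including lo
        axes.append(lo + steps[name] * np.arange(n))
    for sx, sy, theta, phi in itertools.product(*axes):
        yield {"sx": sx, "sy": sy, "theta": theta, "phi": phi, "tx": 0.0, "ty": 0.0}

# Hypothetical bounds and step lengths; angles in degrees.
bounds = {"sx": (0.9, 1.1), "sy": (1.0, 1.0), "theta": (0.0, 0.0), "phi": (0.0, 10.0)}
steps = {"sx": 0.1, "sy": 0.1, "theta": 1.0, "phi": 2.5}
samples = list(sample_linear_params(bounds, steps))
print(len(samples))  # 3 * 1 * 1 * 5 = 15 precomputed models
```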

In step (5), the transformed image of the current level of discretization, i.e., the image at the current level of the pyramid or the appropriately smoothed image, which was generated in step (2), is transformed with the current transformation parameters. Here care must be taken that the object still lies completely within the image after the image transformation. If necessary, a translation is added to the transformation to achieve this, which is accounted for when the extracted model points are added to
the object can be subsampled to generate a model with fewer model points, i.e., only every k-th point of the contour is added to the model.

Finally, the model points obtained in step (6) are added to the collection of models at the current discretization level, along with the transformation parameters that were used to generate the transformed model. To make the matching more robust if a greedy search strategy is used, it is useful to add the points to the model in an order in which the first few points of the model are distributed well across the model. This is necessary in the greedy strategy because if all points in the first part of the model happen to be occluded, while all other points would be present, the matching strategy may not find the instance of the model. The simplest way to achieve an even distribution of the first model points is to add the model points to the model in a randomized order.
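Subsampling and randomized ordering can be sketched with the standard library alone. The function name `order_model_points` and the fixed seed are hypothetical choices; a seeded generator keeps the generated model reproducible from run to run, while the shuffle ensures that any prefix of the point list is spread over the whole contour rather than concentrated on one possibly occluded part.

```python
import random

def order_model_points(contour, k=1, seed=0):
    """Keep every k-th contour point, then shuffle the points so that any prefix
    of the model is spread over the whole object (useful for a greedy search
    that may stop after examining only the first points)."""
    order = list(contour[::k])
    rng = random.Random(seed)  # fixed seed for a reproducible model
    rng.shuffle(order)
    return order

contour = [(i, 0) for i in range(100)]  # hypothetical straight contour
pts = order_model_points(contour, k=2)
print(len(pts), pts[:5])
```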

The model generation strategy described above may generate a very large number of precomputed models if the search space of allowable transformations is large. As a result, the memory required to store the precomputed models will be very large, which means either that the model cannot be stored in memory or that it must be paged to disk on systems that support virtual memory. In the second case, the object recognition will be slowed down because the parts of the model that are needed in the recognition phase must be paged back into the main memory from disk. Therefore, if the memory required to store the precomputed models becomes too large, an alternative model generation strategy is to omit step (4) of the method above, and instead to compute only one precomputed model of the object at each level of discretization, corresponding to transformation parameters that leave the object unchanged, i.e., s_x = s_y = 1 and φ = θ = 0°. In this case, the transformation of